
\documentclass[11pt]{article} % use larger type; default would be 10pt
\usepackage{framed}
\usepackage[utf8]{inputenc} % set input encoding (not needed with XeLaTeX)
\usepackage{geometry} % to change the page dimensions
\geometry{a4paper} % or letterpaper (US) or a5paper or....
% \geometry{margin=2in} % for example, change the margins to 2 inches all round
% \geometry{landscape} % set up the page for landscape
%   read geometry.pdf for detailed page layout information

\usepackage{graphicx} % support the \includegraphics command and options

% \usepackage[parfill]{parskip} % Activate to begin paragraphs with an empty line rather than an indent

%%% PACKAGES
\usepackage{booktabs} % for much better looking tables
\usepackage{array} % for better arrays (eg matrices) in maths
\usepackage{paralist} % very flexible & customisable lists (eg. enumerate/itemize, etc.)
\usepackage{verbatim} % adds environment for commenting out blocks of text & for better verbatim
\usepackage{subfig} % make it possible to include more than one captioned figure/table in a single float
% These packages are all incorporated in the memoir class to one degree or another...

%%% HEADERS & FOOTERS
\usepackage{fancyhdr} % This should be set AFTER setting up the page geometry
\pagestyle{fancy} % options: empty , plain , fancy
\renewcommand{\headrulewidth}{0pt} % customise the layout...
\lhead{}\chead{}\rhead{}
\lfoot{}\cfoot{\thepage}\rfoot{}

%%% SECTION TITLE APPEARANCE
\usepackage{sectsty}
\allsectionsfont{\sffamily\mdseries\upshape} % (See the fntguide.pdf for font help)
% (This matches ConTeXt defaults)

%%% ToC (table of contents) APPEARANCE
\usepackage[nottoc,notlof,notlot]{tocbibind} % Put the bibliography in the ToC
\usepackage[titles,subfigure]{tocloft} % Alter the style of the Table of Contents
\renewcommand{\cftsecfont}{\rmfamily\mdseries\upshape}
\renewcommand{\cftsecpagefont}{\rmfamily\mdseries\upshape} % No bold!
\begin{document}
%=================================================================%

%ML Week 6
\section*{Revision}
\subsection*{High variance}

\begin{itemize}
	\item Indicated by a large gap between the training error and the test (or cross-validation) error.
	\item The algorithm has overfit the training data set.
	\item Increasing the regularization parameter $\lambda$ will reduce overfitting.
\end{itemize}
The recommended way to choose the regularization parameter $\lambda$ is to pick the value that gives
the lowest cross-validation error.
You should not use the training data set or the test set for this purpose.
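
In symbols, as a sketch that assumes the usual course notation (the symbols other than $\lambda$ are assumptions and do not appear in the notes above): let $\theta^{(\lambda)}$ be the parameters obtained by training with regularization parameter $\lambda$, and let $J_{cv}$ be the unregularized error on the $m_{cv}$ cross-validation examples. For linear regression the choice is then
\[
\lambda^{\ast} = \arg\min_{\lambda} J_{cv}\bigl(\theta^{(\lambda)}\bigr),
\qquad
J_{cv}(\theta) = \frac{1}{2 m_{cv}} \sum_{i=1}^{m_{cv}} \bigl( h_\theta(x_{cv}^{(i)}) - y_{cv}^{(i)} \bigr)^2 .
\]
The test set is used only afterwards, to report the generalization error of the chosen model.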

\newpage
\section*{Week 6: Advice for Applying Machine Learning}

\subsection*{Question 1. }
You train a learning algorithm, and find that it has unacceptably high error on the test set. You plot the learning curve, and obtain the figure below. Is the algorithm suffering from high bias, high variance, or neither?

\begin{itemize}
	\item[(i)] Neither
	
	\item[(ii)] High variance
	
	\item[(iii)] High bias [CORRECT]
\end{itemize}


%-----------------------------------------------------------%
\subsection*{Question 2. }
Suppose you have implemented regularized logistic regression  to classify what object is in an image (i.e., to do object recognition). However, when you test your hypothesis on a new set of images, you find that it makes unacceptably large errors with its predictions on the new images. However, your hypothesis performs well (has low error) on the training set. Which of the following are promising steps to take? Check all that apply.

\begin{itemize}
\item[(i)] 
SELECTED Try increasing the regularization parameter $\lambda$.
\item[(ii)] 
WRONG Try evaluating the hypothesis on a cross validation set rather than the test set.
\item[(iii)] 
CORRECT Try using a smaller set of features.
\item[(iv)] WRONG
Try decreasing the regularization parameter $\lambda$.
\item[(v)]
SELECTED Get more training examples.


\end{itemize}
%-----------------------------------------------------------%
\subsection*{Question 3. }
Suppose you have implemented regularized logistic regression to predict what items customers will purchase on a web shopping site. However, when you test your hypothesis on a new set of customers, you find that it makes unacceptably large errors in its predictions. Furthermore, the hypothesis performs poorly on the training set. Which of the following might be promising steps to take? Check all that apply.

\begin{itemize}
\item[(i)] SELECTED Try decreasing the regularization parameter $\lambda$.

\item[(ii)] WRONG Use fewer training examples.

\item[(iii)] WRONG Try evaluating the hypothesis on a cross validation set rather than the test set.

\item[(iv)] CORRECT Try adding polynomial features.
\end{itemize}
%=====================================================================================%
\subsection*{Question 4. }
Which of the following statements are true? Check all that apply.

\begin{itemize}
	\item[(i)]
WRONG Suppose you are training a regularized linear regression model. The recommended way to choose what value of regularization parameter $\lambda$ to use is to choose the value of $\lambda$ which gives the lowest test set error.
	\item[(ii)]
CORRECT Suppose you are training a regularized linear regression model. The recommended way to choose what value of regularization parameter $\lambda$ to use is to choose the value of $\lambda$ which gives the lowest cross validation error.
	\item[(iii)]
CORRECT The performance of a learning algorithm on the training set will typically be better than its performance on the test set.
	\item[(iv)]
WRONG Suppose you are training a regularized linear regression model. The recommended way to choose what value of regularization parameter $\lambda$ to use is to choose the value of $\lambda$ which gives the lowest training set error.
\item[(v)] CORRECT A typical split of a dataset into training, validation and test sets might be 60\% training set, 20\% validation set, and 20\% test set.
\item[(vi)] WRONG It is okay to use data from the test set to choose the regularization parameter $\lambda$, but not the model parameters ($\theta$).
\item[(vii)] WRONG Suppose you are training a logistic regression classifier using polynomial features and want to select what degree polynomial (denoted $d$ in the lecture videos) to use. After training the classifier on the entire training set, you decide to use a subset of the training examples as a validation set. This will work just as well as having a validation set that is separate (disjoint) from the training set.

\item[(viii)] CORRECT Suppose you are using linear regression to predict housing prices, and your dataset comes sorted in order of increasing sizes of houses. It is then important to randomly shuffle the dataset before splitting it into training, validation and test sets, so that we don't have all the smallest houses going into the training set, and all the largest houses going into the test set.
\end{itemize}

%-----------------------------------------------------------%
\subsection*{Question 5. }
Which of the following statements are true? Check all that apply.

\begin{itemize}
	\item[(i)]
CORRECT A model with more parameters is more prone to overfitting and typically has higher variance. 
	\item[(ii)]
WRONG If the training and test errors are about the same, adding more features will not help improve the results. 
	\item[(iii)]
CORRECT If a learning algorithm is suffering from high variance, adding more training examples is likely to improve the test error.
	\item[(iv)]
CORRECT If a learning algorithm is suffering from high bias, only adding more training examples may not improve the test error significantly.
\end{itemize}
%=================================================================%
\newpage
\section*{Week 6: Machine Learning System Design}

\subsection*{Question 1} 
You are working on a spam classification system using regularized logistic regression. ``\textit{Spam}'' is the positive class ($y = 1$) and ``\textit{not spam}'' is the negative class ($y = 0$). You have trained your classifier and there are $m = 1000$ examples in the cross-validation set. The chart of predicted class vs. actual class is:

\begin{center}
\begin{tabular}{|c|c|c|}\hline 
	& Actual Class: 1	& Actual Class: 0 \\ \hline
	Predicted Class: 1 &	85&	890 \\  \hline  
	Predicted Class: 0 &	15&	10\\ \hline
\end{tabular} 
\end{center}


\begin{framed}
For reference:

\begin{itemize}
	\item Accuracy = (true positives + true negatives) / (total examples)
	\item Precision = (true positives) / (true positives + false positives)
	\item Recall = (true positives) / (true positives + false negatives)
	\item F1 score = (2 $\times$ precision $\times$ recall) / (precision + recall)
\end{itemize}

\end{framed}
What is the classifier's precision (as a value from 0 to 1)?

Enter your answer in the box below. If necessary, provide at least two values after the decimal point.

0.09
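
As a worked check, reading the counts off the table above (true positives $= 85$, false positives $= 890$):
\[
\mathrm{Precision} = \frac{85}{85 + 890} = \frac{85}{975} \approx 0.087 \approx 0.09 .
\]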
%----------------------------------------------------------%
\subsection*{Question 2.}
CORRECT
Suppose a massive dataset is available for training a learning algorithm. Training on a lot of data is likely to give good performance when two of the following conditions hold true. Which are the two?

\begin{itemize}
	\item[(i)] WRONG When we are willing to include high order polynomial features of x (such as $x^2_1$, $x^2_2$, $x_1x_2$, etc.).
	
\item[(ii)]	SELECTED The features x contain sufficient information to predict y accurately. (For example, one way to verify this is if a human expert on the domain can confidently predict y when given only x).
	
	
\item[(iii)] WRONG  We train a learning algorithm with a 	large number of parameters (that is able to learn/represent fairly complex functions).
	
	
	
\item[(iv)] 	We train a learning algorithm with a small number of parameters (that is thus unlikely to overfit).
\end{itemize}



%----------------------------------------------------------%
\subsection*{Question 3.}

Suppose you have trained a logistic regression classifier which is outputting $h_\theta(x)$. Currently, you predict 1 if $h_\theta(x) \geq$ threshold, and predict 0 if $h_\theta(x) <$ threshold, where currently the threshold is set to 0.5. Suppose you decrease the threshold to 0.1. Which of the following are true? Check all that apply.

\begin{itemize}
	\item[(i)] The classifier is likely to have unchanged precision and recall, but lower accuracy.
	
\item[(ii)] 	The classifier is likely to now have higher precision.
	
\item[(iii)] 	SELECTED The classifier is likely to now have higher recall.
	
\item[(iv)]	The classifier is likely to have unchanged precision and recall, but higher accuracy.
	
	%----------------------%
	
\item[(v)] 	The classifier is likely to have unchanged precision and recall, and thus the same F1 score.
	
\item[(vi)] 	The classifier is likely to have unchanged precision and recall, but	higher accuracy.
	
	
\item[(vii)] 	SELECTED The classifier is likely to now have lower precision.
	
\item[(viii)]	The classifier is likely to now have lower recall.
\end{itemize}
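
A sketch of why lowering the threshold raises recall and tends to lower precision, using notation that is not part of the question: let $TP(t)$, $FP(t)$ and $FN(t)$ be the counts of true positives, false positives and false negatives at threshold $t$, and let $P = TP(t) + FN(t)$ be the number of actual positives, which does not depend on $t$. Lowering the threshold from $0.5$ to $0.1$ can only move examples from ``predict 0'' to ``predict 1'', so
\[
TP(0.1) \geq TP(0.5), \qquad FP(0.1) \geq FP(0.5), \qquad \mathrm{Recall}(t) = \frac{TP(t)}{P} .
\]
Recall therefore cannot decrease. Precision, $TP(t) / (TP(t) + FP(t))$, is not guaranteed to move in one direction, but it typically falls because the newly admitted low-confidence examples are more often false positives.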

%---------------------------------------------------------%
%----------------------------------------------------------%
\subsection*{Question 4.} WRONG
Suppose you are working on a spam classifier, where spam emails are positive examples (y=1) and non-spam emails are
negative examples (y=0). You have a training set of emails in which 99\% of the emails are non-spam and the other 1\% is
spam. Which of the following statements are true? Check all that apply.

\begin{itemize}
	\item WRONG If you always predict spam (output y=1), your classifier will have a recall of 0\% and precision of 99\%.
	
	
	\item WRONG If you always predict non-spam (output y=0), your classifier will have a recall of 0\%.
	
	
	\item CORRECT If you always predict spam (output y=1), your classifier will have a recall of 100\% and precision of 1\%.
	
	
	
	\item CORRECT If you always predict non-spam (output y=0), your classifier will have an accuracy of 99\%.
\end{itemize}

\[\begin{array}{|c|c|c|} \hline
& \mbox{predict T} & \mbox{predict F} \\ \hline
\mbox{actual T} & & \\ \hline
\mbox{actual F} & & \\ \hline
\end{array} \]
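
As a worked illustration, assume a hypothetical set of 1000 emails with the stated proportions, so that 10 are spam and 990 are non-spam (the figure 1000 is an assumption chosen for easy arithmetic, not a number from the question).

If the classifier always predicts spam ($y=1$), then true positives $= 10$, false positives $= 990$ and false negatives $= 0$:
\[
\mathrm{Recall} = \frac{10}{10 + 0} = 100\%, \qquad \mathrm{Precision} = \frac{10}{10 + 990} = 1\% .
\]
If it always predicts non-spam ($y=0$), then true positives $= 0$, false negatives $= 10$ and true negatives $= 990$:
\[
\mathrm{Recall} = \frac{0}{0 + 10} = 0\%, \qquad \mathrm{Accuracy} = \frac{0 + 990}{1000} = 99\% .
\]
In the second case precision is $0/0$ and is therefore undefined.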


%--------------------------------------%
%----------------------------------------------------------%
\subsection*{Question 5.}

Which of the following statements are true? Check all that apply.

\begin{itemize}
\item[(i)] CORRECT Using a very large training set makes it unlikely for the model to overfit the training data.
	
	
\item[(ii)] 	WRONG If your model is underfitting the training set, then obtaining more data is likely to help.
	
\item[(iii)] CORRECT The ``error analysis'' process of manually examining the examples which your algorithm got wrong can help suggest what are good steps to take (e.g., developing new features) to improve your algorithm's performance.
	
	
	
\item[(iv)]	WRONG It is a good idea to spend a lot of time collecting a large amount of data before building your first version of a learning algorithm.
	
	
	
\item[(v)] WRONG After training a logistic regression classifier, you must use 0.5 as your threshold for predicting whether an example is positive or	negative.
\end{itemize}


\newpage

What is the classifier's recall? 0.85

What is the classifier's accuracy? 0.095
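
Both values can be checked against the confusion table in Question 1 of this section (true positives $= 85$, false positives $= 890$, false negatives $= 15$, true negatives $= 10$):
\[
\mathrm{Recall} = \frac{85}{85 + 15} = 0.85, \qquad
\mathrm{Accuracy} = \frac{85 + 10}{1000} = 0.095 .
\]
For completeness, the F1 score from the same counts (not asked for in these notes) would be
\[
F_1 = \frac{2 \times 85}{2 \times 85 + 890 + 15} = \frac{170}{1075} \approx 0.16 ,
\]
which is algebraically the same as $2 \times \mathrm{precision} \times \mathrm{recall} / (\mathrm{precision} + \mathrm{recall})$.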
%---------------------------------------------------------------%
Suppose a massive dataset is available for training a learning algorithm. Training on a lot of data is likely to give good performance when two of the following conditions hold true.

Which are the two?


We train a learning algorithm with a large number of parameters (that is able to learn/represent fairly complex functions).

We train a learning algorithm with a small number of parameters (that is thus unlikely to overfit).

CORRECT The features x contain sufficient information to predict y accurately. (For example, one way to verify this is if a human expert on the domain can confidently predict y when given only x).

WRONG When we are willing to include high order polynomial features of x (such as $x^2_1$, $x^2_2$, $x_1x_2$, etc.).

%---------------------------------------------------------------%

CORRECT The classifier is likely to now have lower precision.

%---------------------------------------------------------------%


CORRECT If you always predict non-spam (output y=0), your classifier will have a recall of 0\%.

If you always predict non-spam (output y=0), your classifier will have an accuracy of 99\%.

WRONG If you always predict non-spam (output y=0), your classifier will have 99\% accuracy on the training set, and it will likely perform similarly on the cross validation set.

WRONG If you always predict non-spam (output y=0), your classifier will have 99\% accuracy on the training set, but it will do much worse on the cross validation set because it has overfit the training data.

 A good classifier should have both a high precision and high recall on the cross validation set.

%------------------------------------------------------------%
Q5

CORRECT The ``error analysis'' process of manually examining the examples which your algorithm got wrong
can help suggest what are good steps to take (e.g., developing new features) to improve your algorithm's
performance.

CORRECT
Using a very large training set makes it unlikely for the model to overfit the training data.


\end{document}
